In complex scenes, and especially at urban traffic intersections, a deep understanding of entity relations and motion behaviors is important for high-quality planning. We propose D2-TPred, a trajectory prediction method for traffic-light scenarios that uses a spatial dynamic interaction graph (SDG) and a behavior dependency graph (BDG) to handle the problem of discontinuous dependencies in spatio-temporal space. Specifically, the SDG captures spatial interactions by reconstructing sub-graphs for different agents with dynamic and changeable characteristics in each frame. The BDG infers motion tendency by modeling the implicit dependency of the current state on prior behaviors, especially the discontinuous motions corresponding to acceleration, deceleration, or turning direction. Moreover, we present a new dataset for vehicle trajectory prediction under traffic lights, called VTP-TL. Our experimental results show that, compared with other trajectory prediction algorithms, our model achieves improvements of 20.45% and 20.78% in terms of ADE and FDE, respectively. The dataset and code are available at: https://github.com/vtp-tl/d2-tpred.
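To make the graph construction concrete, the sketch below (our own illustration; the function names, distance threshold, and data layout are assumptions, not taken from D2-TPred) builds a per-frame interaction graph over nearby agents and extracts the velocity changes that a behavior dependency would condition on.

```python
# Hypothetical sketch of per-frame dynamic interaction graphs (SDG-style)
# and frame-to-frame behavior cues (BDG-style). Names, thresholds, and
# data layouts are illustrative assumptions.
import numpy as np

def spatial_interaction_graph(positions, radius=10.0):
    """positions: (N, 2) agent coordinates in one frame.
    Returns an (N, N) adjacency matrix linking nearby agents."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < radius).astype(np.float32)
    np.fill_diagonal(adj, 0.0)            # no self-loops
    return adj

def behavior_dependency_cues(trajectory):
    """trajectory: (T, 2) positions of one agent over T frames.
    Returns per-step velocities and velocity changes, a crude proxy for
    the acceleration / deceleration / turning cues a BDG would encode."""
    velocity = np.diff(trajectory, axis=0)        # (T-1, 2)
    delta_v = np.diff(velocity, axis=0)           # (T-2, 2)
    return velocity, delta_v

# Toy usage: 3 agents over 4 frames.
frames = np.random.rand(4, 3, 2) * 20.0
per_frame_graphs = [spatial_interaction_graph(f) for f in frames]
vel, acc = behavior_dependency_cues(frames[:, 0, :])
```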
Pedestrian trajectory prediction is an important technique for autonomous driving and has become a research hotspot in recent years. Previous methods mainly rely on pedestrians' positional relationships to model social interaction, which is clearly insufficient to represent the complex cases that arise in practice. In addition, most existing works usually introduce the scene interaction module as an independent branch and embed the social interaction features during trajectory generation, rather than performing social interaction and scene interaction simultaneously, which may undermine the rationality of trajectory prediction. In this paper, we propose a new prediction model named Social Soft Attention Graph Convolutional Network (SSAGCN), which aims to simultaneously handle the social interactions among pedestrians and the scene interactions between pedestrians and the environment. In detail, when modeling social interaction we propose a new social soft attention function, which fully considers the various interaction factors among pedestrians and can distinguish the influence of surrounding pedestrians on the agent according to different factors in various situations. For the physical interaction, we propose a new sequential scene sharing mechanism: the influence of the scene on one agent at each moment can be shared with other neighbors through social soft attention, so that the influence of the scene is expanded in both the spatial and temporal dimensions. With the help of these improvements, we obtain socially and physically acceptable predicted trajectories. Experiments on publicly available datasets demonstrate the effectiveness of SSAGCN, which achieves state-of-the-art results.
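As a rough illustration of how a soft-attention weighting over neighbors could combine several interaction factors, the following sketch scores neighbors by distance and closing speed and normalizes with a softmax; the specific factors and their combination are assumptions of ours, not SSAGCN's exact formulation.

```python
# Illustrative sketch of a "social soft attention" style weighting over
# neighbors; the factors and how they are combined are assumptions.
import numpy as np

def social_soft_attention(ego_pos, ego_vel, nbr_pos, nbr_vel):
    """ego_pos/ego_vel: (2,), nbr_pos/nbr_vel: (N, 2).
    Returns softmax-normalized attention weights over the N neighbors."""
    rel_pos = nbr_pos - ego_pos
    dist = np.linalg.norm(rel_pos, axis=-1) + 1e-6
    closing_speed = -np.sum((nbr_vel - ego_vel) * rel_pos, axis=-1) / dist
    # Closer neighbors and neighbors approaching the ego get larger scores.
    score = 1.0 / dist + np.maximum(closing_speed, 0.0)
    score = score - score.max()                    # numerical stability
    weights = np.exp(score) / np.exp(score).sum()
    return weights

weights = social_soft_attention(
    np.array([0.0, 0.0]), np.array([1.0, 0.0]),
    np.array([[2.0, 0.5], [5.0, -1.0]]),
    np.array([[-1.0, 0.0], [0.5, 0.0]]),
)
```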
Accurate organ-at-risk (OAR) segmentation is critical in radiotherapy for reducing post-treatment complications. Expert guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region; however, due to the predictably prohibitive labor cost of this task, most institutions adopt a substantially simplified protocol, delineating a smaller subset of OARs and neglecting the dose distributions associated with the remaining ones. In this work, we propose SOARS, a novel, automated and highly efficient stratified OAR segmentation system based on deep learning that precisely delineates a comprehensive set of 42 H&N OARs. SOARS stratifies the 42 OARs into anchor, mid-level, and small-and-hard subcategories, with a neural network architecture specifically derived for each category via neural architecture search (NAS) principles. We built the SOARS models using 176 training patients from an internal institution and independently evaluated them on 1327 external patients across six different institutions. In each institutional evaluation, SOARS consistently outperformed other state-of-the-art methods by at least 3-5% in Dice score (a 36% relative error reduction in other metrics). More importantly, extensive multi-user studies clearly demonstrated that 98% of SOARS predictions require only very minor or no revisions for direct clinical acceptance (saving roughly 90% of the clinicians' delineation workload), and that their segmentation and dosimetric accuracy lie within or below the inter-user variation. These findings confirm the strong clinical applicability of SOARS to the OAR delineation process in the H&N cancer radiotherapy workflow, improving its efficiency, comprehensiveness and quality.
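The stratify-then-specialize idea can be pictured as routing each OAR to the model of its category, as in the toy sketch below; the category assignments and the placeholder segmenter are purely illustrative and are not SOARS' NAS-derived architectures.

```python
# Minimal illustration of "stratify the OARs, then use a per-category
# model"; assignments and the dummy segmenter are assumptions.
from typing import Callable, Dict
import numpy as np

OAR_CATEGORY: Dict[str, str] = {
    "brainstem": "anchor",
    "parotid_left": "mid-level",
    "optic_chiasm": "small_and_hard",
}

def make_dummy_segmenter(category: str) -> Callable[[np.ndarray], np.ndarray]:
    """Stand-in for a NAS-derived network specialized to one category."""
    def segment(volume: np.ndarray) -> np.ndarray:
        # A real model would return a probability map; we return zeros.
        return np.zeros_like(volume)
    return segment

segmenters = {cat: make_dummy_segmenter(cat)
              for cat in {"anchor", "mid-level", "small_and_hard"}}

def segment_oar(volume: np.ndarray, oar_name: str) -> np.ndarray:
    """Route each OAR to the model of its stratified category."""
    return segmenters[OAR_CATEGORY[oar_name]](volume)

mask = segment_oar(np.zeros((64, 64, 64)), "brainstem")
```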
Given a resource-rich source graph and a resource-scarce target graph, how can we effectively transfer knowledge across graphs and ensure a good generalization performance? In many high-impact domains (e.g., brain networks and molecular graphs), collecting and annotating data is prohibitively expensive and time-consuming, which makes domain adaptation an attractive option to alleviate the label scarcity issue. In light of this, the state-of-the-art methods focus on deriving domain-invariant graph representation that minimizes the domain discrepancy. However, it has recently been shown that a small domain discrepancy loss may not always guarantee a good generalization performance, especially in the presence of disparate graph structures and label distribution shifts. In this paper, we present TRANSNET, a generic learning framework for augmenting knowledge transfer across graphs. In particular, we introduce a novel notion named trinity signal that can naturally formulate various graph signals at different granularity (e.g., node attributes, edges, and subgraphs). With that, we further propose a domain unification module together with a trinity-signal mixup scheme to jointly minimize the domain discrepancy and augment the knowledge transfer across graphs. Finally, comprehensive empirical results show that TRANSNET outperforms all existing approaches on seven benchmark datasets by a significant margin.
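One simple way to picture a mixup-style augmentation over graph signals is to interpolate node features and labels drawn from the source and target graphs, as in the sketch below; this is generic mixup under our own assumptions, not TRANSNET's trinity-signal formulation.

```python
# Hedged sketch: mixup between a source-graph signal and a target-graph
# signal (node features + labels) to blur the domain boundary.
import numpy as np

def graph_signal_mixup(x_src, y_src, x_tgt, y_tgt, alpha=0.2, rng=None):
    """x_*: (N, F) node features, y_*: (N, C) one-hot labels.
    Returns convex combinations of the two signals."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x_src + (1.0 - lam) * x_tgt
    y_mix = lam * y_src + (1.0 - lam) * y_tgt
    return x_mix, y_mix, lam

x_mix, y_mix, lam = graph_signal_mixup(
    np.random.rand(5, 8), np.eye(3)[[0, 1, 2, 0, 1]],
    np.random.rand(5, 8), np.eye(3)[[2, 2, 1, 0, 0]],
)
```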
Learning a fast and discriminative patch descriptor is a challenging topic in computer vision. Recently, many works train descriptor learning networks by minimizing a triplet loss (or its variants), which is expected to decrease the distance of each positive pair and increase the distance of each negative pair. However, this expectation has to be lowered because the network optimizer converges imperfectly to a local solution. To address this problem, as well as the open problem of computational speed, we propose a descriptor distillation framework for local descriptor learning, called DesDis, in which a student model acquires knowledge from a pre-trained teacher model and is further enhanced through a designed teacher-student regularizer. This regularizer constrains the difference between the positive (and likewise negative) similarities of the teacher model and those of the student model, and we theoretically prove that a more effective student model can be trained by minimizing a weighted combination of the triplet loss and this regularizer than by minimizing the triplet loss alone, as the teacher does. Under the proposed DesDis, many existing descriptor networks can be embedded as teacher models, and accordingly both equal-weight and light-weight student models can be derived, which outperform their teachers in either accuracy or speed. Experimental results on 3 public datasets demonstrate that the equal-weight student models derived from the proposed DesDis framework, using three typical descriptor learning networks as teacher models, achieve better performance than their teachers and several other comparative methods. In addition, at a similar patch verification performance, the derived light-weight models achieve speeds that are 8 times faster or even higher.
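A minimal sketch of such a combined objective is given below, assuming L2-normalized descriptors, a mean-squared similarity regularizer, and a hand-picked weight lam; these are our assumptions rather than the paper's exact formulation.

```python
# Sketch: triplet loss plus a teacher-student similarity regularizer,
# in the spirit of DesDis; weighting and distance choices are assumptions.
import torch
import torch.nn.functional as F

def desdis_style_loss(anchor_s, pos_s, neg_s,
                      anchor_t, pos_t, neg_t,
                      margin=1.0, lam=0.5):
    """*_s are student descriptors, *_t are (frozen) teacher descriptors,
    each of shape (B, D) and assumed L2-normalized."""
    triplet = F.triplet_margin_loss(anchor_s, pos_s, neg_s, margin=margin)
    # Positive / negative similarities for student and teacher.
    pos_sim_s = (anchor_s * pos_s).sum(dim=1)
    neg_sim_s = (anchor_s * neg_s).sum(dim=1)
    pos_sim_t = (anchor_t * pos_t).sum(dim=1)
    neg_sim_t = (anchor_t * neg_t).sum(dim=1)
    # Regularizer: keep the student's similarities close to the teacher's.
    reg = F.mse_loss(pos_sim_s, pos_sim_t) + F.mse_loss(neg_sim_s, neg_sim_t)
    return triplet + lam * reg

a_s, p_s, n_s, a_t, p_t, n_t = [F.normalize(torch.randn(4, 128), dim=1)
                                for _ in range(6)]
loss = desdis_style_loss(a_s, p_s, n_s, a_t, p_t, n_t)
```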
In this paper, we study representation learning for multi-task decision-making in non-stationary environments. We consider the framework of sequential linear bandits, in which the agent performs a series of tasks drawn from different sets associated with different environments. The task embeddings within each set share a low-dimensional feature extractor called a representation, and representations differ across sets. We propose an online algorithm that facilitates efficient decision-making by learning and transferring non-stationary representations in an adaptive fashion. We prove that our algorithm significantly outperforms existing ones that treat tasks independently. We also conduct experiments on both synthetic and real data to validate our theoretical insights and demonstrate the efficacy of our algorithm.
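The shared-representation reward model can be illustrated with a toy construction (ours, not the paper's algorithm): rewards are linear in B^T x, where B is a feature extractor shared by the tasks of one environment and theta is task-specific.

```python
# Toy illustration of a shared low-dimensional representation in linear
# bandits; dimensions and the random construction are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 3                                   # ambient vs. representation dim
B = np.linalg.qr(rng.normal(size=(d, k)))[0]   # shared feature extractor

def expected_reward(x, theta):
    """x: (d,) action features, theta: (k,) task-specific parameter."""
    return theta @ (B.T @ x)

theta_task = rng.normal(size=(k,))
actions = rng.normal(size=(10, d))
best_action = actions[np.argmax([expected_reward(a, theta_task)
                                 for a in actions])]
```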
Deep-learning-based video coding has attracted considerable attention for its great potential to squeeze out the spatio-temporal redundancies of video sequences. This paper proposes an efficient codec, namely the dual-path generative adversarial network-based video codec (DGVC). First, we propose a dual-path enhancement with generative adversarial network (DPEG) to reconstruct the details of compressed video. The DPEG consists of an $\alpha$-path with an auto-encoder and convolutional long short-term memory (ConvLSTM), which has a large receptive field and multi-frame references and facilitates the reconstruction of structural features, and a $\beta$-path of residual attention blocks, which facilitates the reconstruction of local texture features. The two paths are fused and co-trained through a generative adversarial process. Second, we reuse the DPEG network in both the motion compensation and quality enhancement modules, which are further combined with the motion estimation and entropy coding modules in our DGVC framework. Third, we adopt joint training of deep video compression and enhancement to further improve the rate-distortion (RD) performance. Compared with x265 in the LDP very fast mode, our DGVC reduces the average bits per pixel (bpp) by 39.39%/54.92% at the same PSNR/MS-SSIM, outperforming state-of-the-art deep video codecs by a considerable margin.
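A heavily simplified sketch of the dual-path idea is given below: one path with a large receptive field for global structure and one residual-attention path for local texture, fused into a residual output. The layer sizes, the omission of ConvLSTM, and the absence of adversarial training are deliberate simplifications of ours, not the DPEG architecture.

```python
# Hypothetical dual-path enhancement block: autoencoder-style path for
# global structure, residual-attention path for local texture, 1x1 fusion.
import torch
import torch.nn as nn

class DualPathEnhancer(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # alpha path: downsample-upsample autoencoder (large receptive field)
        self.alpha = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        # beta path: residual block with channel attention (local texture)
        self.beta_conv = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.beta_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid()
        )
        self.fuse = nn.Conv2d(2 * ch, 3, 1)

    def forward(self, x):
        a = self.alpha(x)
        b = self.beta_conv(x)
        b = b * self.beta_att(b)
        return x + self.fuse(torch.cat([a, b], dim=1))   # residual output

y = DualPathEnhancer()(torch.randn(1, 3, 64, 64))
```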
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
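A hedged sketch of what distilling token relations might look like: match the student's token-to-token affinity map against the teacher's (taken from an intermediate layer) with a KL objective. The relation definition, temperature, and shapes below are assumptions; TinyMIM's exact targets may differ.

```python
# Sketch of token-relation distillation via KL between affinity maps.
import torch
import torch.nn.functional as F

def relation_map(tokens, temperature=1.0):
    """tokens: (B, N, D). Returns (B, N, N) row-normalized affinities."""
    tokens = F.normalize(tokens, dim=-1)
    logits = tokens @ tokens.transpose(1, 2) / temperature
    return F.softmax(logits, dim=-1)

def token_relation_distill_loss(student_tokens, teacher_tokens):
    p_teacher = relation_map(teacher_tokens)          # target distribution
    log_p_student = torch.log(relation_map(student_tokens) + 1e-8)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

s = torch.randn(2, 197, 192)    # e.g. ViT-Tiny tokens (assumed shapes)
t = torch.randn(2, 197, 768)    # e.g. teacher tokens from a middle layer
loss = token_relation_distill_loss(s, t)
```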
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT has strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
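The cross-modal token idea can be pictured as projecting image and point-cloud tokens into a shared width, adding 3D-coordinate-based position embeddings, and letting object queries attend to the concatenated sequence; the module below is a hypothetical miniature of ours, not CMT's actual architecture.

```python
# Hypothetical miniature of a cross-modal, query-based 3D detection head.
import torch
import torch.nn as nn

class TinyCrossModalHead(nn.Module):
    def __init__(self, d_model=128, num_queries=20):
        super().__init__()
        self.img_proj = nn.Linear(256, d_model)    # image token channels
        self.pts_proj = nn.Linear(64, d_model)     # point token channels
        self.pos_mlp = nn.Linear(3, d_model)       # encode 3D coordinates
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.box_head = nn.Linear(d_model, 7)      # (x, y, z, w, l, h, yaw)

    def forward(self, img_tok, img_xyz, pts_tok, pts_xyz):
        tokens = torch.cat([
            self.img_proj(img_tok) + self.pos_mlp(img_xyz),
            self.pts_proj(pts_tok) + self.pos_mlp(pts_xyz),
        ], dim=1)                                   # (B, N_img + N_pts, D)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.box_head(self.decoder(q, tokens))

boxes = TinyCrossModalHead()(
    torch.randn(1, 100, 256), torch.rand(1, 100, 3),
    torch.randn(1, 200, 64), torch.rand(1, 200, 3),
)
```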
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
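A simplified sketch of the NAIVEATTACK-style idea follows: stamp a trigger patch onto a fraction of the images fed into distillation so that the trigger (and its target label) gets baked into the synthetic set. The patch size, location, and poisoning ratio are illustrative assumptions, and the iterative trigger updates of DOORPING are omitted.

```python
# Sketch: inject a backdoor trigger into the data used for distillation.
import numpy as np

def add_trigger(images, size=3, value=1.0):
    """images: (N, H, W, C) in [0, 1]; stamps a white square bottom-right."""
    poisoned = images.copy()
    poisoned[:, -size:, -size:, :] = value
    return poisoned

def poison_for_distillation(images, labels, target_class=0, ratio=0.1,
                            rng=None):
    """Poison a fraction of the data fed into the distillation procedure."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(images), size=int(ratio * len(images)),
                     replace=False)
    images = images.copy()
    labels = labels.copy()
    images[idx] = add_trigger(images[idx])
    labels[idx] = target_class          # backdoor maps trigger -> target
    return images, labels

imgs, lbls = poison_for_distillation(
    np.random.rand(100, 32, 32, 3), np.random.randint(0, 10, size=100))
```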